Code of America - NDoCH 2020

Code for Sacramento - Asset Map

Change Log

  • 08-21-2020: Baseline Version v0.1

Introduction

This project focuses on the asset map item for the CFA NDoCH event; it consists of a web application to visualize various assets within the Sacramento area. The data is sourced from various open data portals.

Data processing and maps are created using Jupyter Notebook due to its ability to visualize results effectively and efficiently. Listed below are installation instructions and more about Jupyter and the Python programming language.

Instructions

The CFA-NDoCH instructions are shown below and specify that open data sources should be used to visualize resources available to the Sacramento community. This notebook is intended as starting point to visualize such data for further development.

Asset mapping is an integral part of empowered community building that is based on understanding the strengths and needs of diverse communities. First, use publicly available information about your locale to give a sense of the landscape and demographics. Next, research the location and availability of government programs (e.g. county health and human services offices), community based organizations (like resource centers, food banks, and legal aid clinics) or other resources that are vital to your community. Visually documenting the landscape can help identify what might make your community more equitable and accessible to all who live there.

Approach

This notebook starts with a tutorial using Python mapping tools as a prototype, then develops a map for the Sacramento area. Open data sources are listed below and will be added to with additional development.

Tutorials

  1. Folium Tutorial: SFPD Crime Data
  2. SFPD Crime Report Data (2003-18)
  3. Folium Example: Cluster Markers
  4. Folium Tutorial: Cluster Markers (JPYTR)
  5. Folium Tutorial: Shapefile Data (Medium)

CA Geoportal Data

  1. CA Geoportal: Open Datasets
  2. CA School Locations (2019-20)
  3. CA Healthcare Shortage (Primary Care)
  4. CA Healthcare Shortage (Mental Health)
  5. CA Healthcare Shortage (Dental Care)

SACOG Data

  1. Sac Region LIHM Communities (2016)
  2. Sac Region High Transit Frequency Areas (2020)
  3. Sac Region High Quality Transit Areas (2017)

City of Sacramento Data

  1. City of Sacramento Bike Facilities (2018)
  2. City of Sacramento Bike Opportunity Areas (2016)

Jupyter Installation

  1. Download and install Jupyter Notebook from their website
  2. Verify that Jupyter Notebook was installed and visible from Windows Start menu
  3. Start Jupyter Notebook; it will start CMD shell and load in the web browser
  4. Save this notebook and CSV data to your "Documents" folder and navigate to it from the Notebook start page
  5. Open this notebook from the start page; file and cells should be viewable

Jupyter Introduction

This notebook will require some basic understanding of the Python programming language, Jupyter platform and data analysis concepts.

Jupyter is a powerful collaborative tool which is open-source and light-weight. It provides all the tools necessary to run data analysis, visualization, statistics and data science out of the box. In addition, it has gain acceptance from industry and academia for collaborating on projects and publishing work.

Jupyter is a combination of text and code with the programming run-time built into the platform so there is no need to install additional software. The text is in the markdown file format (similar to HTML), and code in several languages. It is organized by cells which can consist of either text or code; placed together, they can be sent as a single document to share/publish work.

Jupyter Notebook

Notebooks are organized by cells, which mainly consist of text (in markdown) and code (Python). It operations like a hybrid between MS Word and Excel file; whereas the entire file is like a document, the cells operate like a spreadsheet. For getting started, feel free to scroll down each cell and navigate around the cells for a quick tour. Here is a breakdown of how to view/edit cells:

  1. Each cell may be edited by hitting ENTER; toggle between cells using the arrow keys or mouse/scroller
  2. When editing a cell, be sure to select "markdown" for text or "code" before writing into it
  3. Each cell can be run by hitting CTRL + ENTER or the "run" button form the menu bar
  4. Output from each cell will appear below; if an error occurs, please read and try to debug it(!)
  5. File can be saved by hitting CTRL + "s" or file/save from the pulldown menu above

Quick Start

This notebook will require some Python programming, which is widely used and gain enough traction to be taught in high school and AP Computer Science courses.

Jupyter supports several different languages (R, Scala and Julia); however, Python is the most popular of them and can be used for other tasks, primarily data science and web programming.

Exercises

If you are new to Jupyter, then please review the links below:

  1. Intro Guide (DataQuest)
  2. Intro Guide (DataCamp)
  3. Notebook Intro (Medium)
  4. Intro Guide (Jupyter)

If you are new to programming or Python, then please review the links below:

  1. Quick Start (Official)
  2. Intro Tutorials (Real Python)
  3. Quick Start (FCC)

If you are new to programming or Markdown, then please review the links below:

  1. Quick Start (Github)
  2. Quick Start Guide (Markdown)
  3. Quick Start Tutorial (Markdown)
In [106]:
# 01 - load modules into notebook

# data analysis module
import pandas as pd

# numerical data modules
import numpy as np
import scipy

# data visualization module
import matplotlib.pyplot as plt

# adjust plot settings
%matplotlib inline

# data visualization module
# https://seaborn.pydata.org/
# import seaborn as sns; sns.set(color_codes=True)

# geospatial modules
from shapely.geometry import Point, Polygon
from shapely.geometry import shape, LineString, Point
import geopandas as gpd
import geojsonio
from descartes import PolygonPatch
import fiona

# geospatial and geojson modules
import folium
from folium.plugins import MarkerCluster
import os
import json

# install pip package in current kernel; run only for initial install:
# https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
# import sys
# !{sys.executable} -m pip install --upgrade pip
# !{sys.executable} -m pip install seaborn==0.9.0
In [107]:
# 02.00 - data functions

# function to read csv file
# https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url/41880513#41880513
def read_data(path):
    df = pd.read_csv(path)
    return(df)

# function to output csv file
def output_result(df, filepath):
    df.to_csv(filepath)

# function to show table info
def data_profile(df, msg):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Table Info: %s ***' % msg, '\n')
    print(df.info(), '\n')
    print('*** Table Info: Table Dimensions ***', '\n')
    print(df.shape, '\n')

# function to show unique value for given column
def show_unique(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Unique Values: (%s) ***' % col, '\n')
    print(df[col].unique(), '\n')

# function to output summary stats
def summary_stats(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Summary Stats: (%s) ***' % col, '\n')
    print(df[col].describe(), '\n')
    # print(col.describe())

# function to rename columns
# https://www.geeksforgeeks.org/how-to-rename-columns-in-pandas-dataframe/
def rename_col(df, old_col, new_col):
    df.rename(
        columns={old_col:new_col},
        inplace=True
    )
    return df

# function convert col to numeric type
# reference: https://stackoverflow.com/questions/47333227/pandas-valueerror-cannot-convert-float-nan-to-integer
def convert_num(df, col):
    # convert type
    df[col] = pd.to_numeric(
        df[col],
        errors='coerce'
    )
    return(df)

# convert string to datetime
# reference: https://stackoverflow.com/questions/32888124/pandas-out-of-bounds-nanosecond-timestamp-after-offset-rollforward-plus-adding-a
def convert_date(df, col):
    # convert type
    df[col] = pd.to_datetime(
        df[col],
        infer_datetime_format=True,
        errors = 'coerce'
    )
    return(df)

# function convert col to string type
def convert_str(df, col):
    # convert type
    df[col].astype(str)
    return(df)
In [108]:
# 02.01 - data import

# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
# df_data = read_data("data/sfpd_report_2003-18.csv")

# note: reduce original file (500mb) by subset first 10k rows and replace file
# https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html
# df_data = df_data[0:10000]
# output_result(df_data, "data/sfpd_report_2003-18.csv")

# read in reduced file after processing steps above
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
df_sfpd = read_data("data/sfpd_report_2003-18.csv")

# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
df_school = read_data("data/ca_school_2019-20.csv")
In [109]:
# 02.02 - data processing

# subset dataset by row values; for example, schools by count
# https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values
df_school_sac = df_school[
    df_school['CountyName'].str.contains('Sacramento')
]
df_school_amador = df_school[
    df_school['CountyName'].str.contains('Amador')
]
df_school_placer = df_school[
    df_school['CountyName'].str.contains('Placer')
]
df_school_yolo = df_school[
    df_school['CountyName'].str.contains('Yolo')
]
df_school_yuba = df_school[
    df_school['CountyName'].str.contains('Yuba')
]

# todo: process geojson
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson
# df_school_geojson = read_data("data/ca_school_2019-20.geojson")

# data profile data after import
data_profile(df_sfpd, 'SFPD Reports (2003-18)')
data_profile(df_school, 'CA Schools (2019-20)')
data_profile(df_school_sac, 'CA Schools: Sacramento County (2019-20)')
data_profile(df_school_amador, 'CA Schools: Amador County (2019-20)')
data_profile(df_school_placer, 'CA Schools: Placer County (2019-20)')
data_profile(df_school_yolo, 'CA Schools: Yolo County (2019-20)')
data_profile(df_school_yuba, 'CA Schools: Yuba County (2019-20)')
*** Table Info: SFPD Reports (2003-18) *** 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 36 columns):
 #   Column                                                    Non-Null Count  Dtype  
---  ------                                                    --------------  -----  
 0   Unnamed: 0                                                10000 non-null  int64  
 1   PdId                                                      10000 non-null  int64  
 2   IncidntNum                                                10000 non-null  int64  
 3   Incident Code                                             10000 non-null  int64  
 4   Category                                                  10000 non-null  object 
 5   Descript                                                  10000 non-null  object 
 6   DayOfWeek                                                 10000 non-null  object 
 7   Date                                                      10000 non-null  object 
 8   Time                                                      10000 non-null  object 
 9   PdDistrict                                                10000 non-null  object 
 10  Resolution                                                10000 non-null  object 
 11  Address                                                   10000 non-null  object 
 12  X                                                         10000 non-null  float64
 13  Y                                                         10000 non-null  float64
 14  location                                                  10000 non-null  object 
 15  SF Find Neighborhoods 2 2                                 9966 non-null   float64
 16  Current Police Districts 2 2                              9992 non-null   float64
 17  Current Supervisor Districts 2 2                          9998 non-null   float64
 18  Analysis Neighborhoods 2 2                                9994 non-null   float64
 19  DELETE - Fire Prevention Districts 2 2                    9975 non-null   float64
 20  DELETE - Police Districts 2 2                             9991 non-null   float64
 21  DELETE - Supervisor Districts 2 2                         9998 non-null   float64
 22  DELETE - Zip Codes 2 2                                    9999 non-null   float64
 23  DELETE - Neighborhoods 2 2                                9994 non-null   float64
 24  DELETE - 2017 Fix It Zones 2 2                            2630 non-null   float64
 25  Civic Center Harm Reduction Project Boundary 2 2          1449 non-null   float64
 26  Fix It Zones as of 2017-11-06  2 2                        2705 non-null   float64
 27  DELETE - HSOC Zones 2 2                                   2146 non-null   float64
 28  Fix It Zones as of 2018-02-07 2 2                         2737 non-null   float64
 29  CBD, BID and GBD Boundaries as of 2017 2 2                2668 non-null   float64
 30  Areas of Vulnerability, 2016 2 2                          9994 non-null   float64
 31  Central Market/Tenderloin Boundary 2 2                    1613 non-null   float64
 32  Central Market/Tenderloin Boundary Polygon - Updated 2 2  1614 non-null   float64
 33  HSOC Zones as of 2018-06-05 2 2                           2293 non-null   float64
 34  OWED Public Spaces 2 2                                    520 non-null    float64
 35  Neighborhoods 2                                           9966 non-null   float64
dtypes: float64(23), int64(4), object(9)
memory usage: 2.7+ MB
None 

*** Table Info: Table Dimensions *** 

(10000, 36) 

*** Table Info: CA Schools (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10039 entries, 0 to 10038
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  10039 non-null  float64
 1   Y                  10039 non-null  float64
 2   OBJECTID           10039 non-null  int64  
 3   Year               10039 non-null  object 
 4   FedID              10039 non-null  int64  
 5   CDSCode            10039 non-null  int64  
 6   CDCode             10039 non-null  int64  
 7   SCode              10039 non-null  int64  
 8   CountyName         10039 non-null  object 
 9   DistrictName       10039 non-null  object 
 10  SchoolName         10039 non-null  object 
 11  SchoolType         10039 non-null  object 
 12  Status             10039 non-null  object 
 13  OpenDate           9980 non-null   object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        10039 non-null  object 
 16  GradeLow           10031 non-null  object 
 17  GradeHigh          10038 non-null  object 
 18  Charter            10039 non-null  object 
 19  CharterNum         1302 non-null   object 
 20  FundingType        1302 non-null   object 
 21  Virtual            10039 non-null  object 
 22  Magnet             10028 non-null  object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   9945 non-null   object 
 26  Street             10039 non-null  object 
 27  City               10039 non-null  object 
 28  Zip                10039 non-null  object 
 29  State              10039 non-null  object 
 30  CongUS             10038 non-null  float64
 31  SenateCA           10038 non-null  float64
 32  AssemblyCA         10038 non-null  float64
 33  Locale             10038 non-null  object 
 34  Latitude           10039 non-null  float64
 35  Longitude          10039 non-null  float64
 36  LocSource          10039 non-null  object 
 37  EnrollTotal        10039 non-null  int64  
 38  AAcount            10039 non-null  int64  
 39  AApct              10039 non-null  float64
 40  AIcount            10039 non-null  int64  
 41  AIpct              10039 non-null  float64
 42  AScount            10039 non-null  int64  
 43  ASpct              10039 non-null  float64
 44  FIcount            10039 non-null  int64  
 45  FIpct              10039 non-null  float64
 46  HIcount            10039 non-null  int64  
 47  HIpct              10039 non-null  float64
 48  PIcount            10039 non-null  int64  
 49  PIpct              10039 non-null  float64
 50  WHcount            10039 non-null  int64  
 51  WHpct              10039 non-null  float64
 52  MRcount            10039 non-null  int64  
 53  MRpct              10039 non-null  float64
 54  NRcount            10039 non-null  int64  
 55  NRpct              10039 non-null  float64
 56  ELcount            10039 non-null  int64  
 57  ELpct              10039 non-null  float64
 58  FOScount           10039 non-null  int64  
 59  FOSpct             10039 non-null  float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           10039 non-null  int64  
 63  MIGpct             10039 non-null  float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            10039 non-null  int64  
 69  UPpct              10039 non-null  float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 6.4+ MB
None 

*** Table Info: Table Dimensions *** 

(10039, 84) 

*** Table Info: CA Schools: Sacramento County (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 380 entries, 5841 to 6220
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  380 non-null    float64
 1   Y                  380 non-null    float64
 2   OBJECTID           380 non-null    int64  
 3   Year               380 non-null    object 
 4   FedID              380 non-null    int64  
 5   CDSCode            380 non-null    int64  
 6   CDCode             380 non-null    int64  
 7   SCode              380 non-null    int64  
 8   CountyName         380 non-null    object 
 9   DistrictName       380 non-null    object 
 10  SchoolName         380 non-null    object 
 11  SchoolType         380 non-null    object 
 12  Status             380 non-null    object 
 13  OpenDate           380 non-null    object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        380 non-null    object 
 16  GradeLow           380 non-null    object 
 17  GradeHigh          379 non-null    object 
 18  Charter            380 non-null    object 
 19  CharterNum         50 non-null     object 
 20  FundingType        50 non-null     object 
 21  Virtual            380 non-null    object 
 22  Magnet             380 non-null    object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   375 non-null    object 
 26  Street             380 non-null    object 
 27  City               380 non-null    object 
 28  Zip                380 non-null    object 
 29  State              380 non-null    object 
 30  CongUS             380 non-null    float64
 31  SenateCA           380 non-null    float64
 32  AssemblyCA         380 non-null    float64
 33  Locale             380 non-null    object 
 34  Latitude           380 non-null    float64
 35  Longitude          380 non-null    float64
 36  LocSource          380 non-null    object 
 37  EnrollTotal        380 non-null    int64  
 38  AAcount            380 non-null    int64  
 39  AApct              380 non-null    float64
 40  AIcount            380 non-null    int64  
 41  AIpct              380 non-null    float64
 42  AScount            380 non-null    int64  
 43  ASpct              380 non-null    float64
 44  FIcount            380 non-null    int64  
 45  FIpct              380 non-null    float64
 46  HIcount            380 non-null    int64  
 47  HIpct              380 non-null    float64
 48  PIcount            380 non-null    int64  
 49  PIpct              380 non-null    float64
 50  WHcount            380 non-null    int64  
 51  WHpct              380 non-null    float64
 52  MRcount            380 non-null    int64  
 53  MRpct              380 non-null    float64
 54  NRcount            380 non-null    int64  
 55  NRpct              380 non-null    float64
 56  ELcount            380 non-null    int64  
 57  ELpct              380 non-null    float64
 58  FOScount           380 non-null    int64  
 59  FOSpct             380 non-null    float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           380 non-null    int64  
 63  MIGpct             380 non-null    float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            380 non-null    int64  
 69  UPpct              380 non-null    float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 252.3+ KB
None 

*** Table Info: Table Dimensions *** 

(380, 84) 

*** Table Info: CA Schools: Amador County (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 14 entries, 383 to 396
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  14 non-null     float64
 1   Y                  14 non-null     float64
 2   OBJECTID           14 non-null     int64  
 3   Year               14 non-null     object 
 4   FedID              14 non-null     int64  
 5   CDSCode            14 non-null     int64  
 6   CDCode             14 non-null     int64  
 7   SCode              14 non-null     int64  
 8   CountyName         14 non-null     object 
 9   DistrictName       14 non-null     object 
 10  SchoolName         14 non-null     object 
 11  SchoolType         14 non-null     object 
 12  Status             14 non-null     object 
 13  OpenDate           14 non-null     object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        14 non-null     object 
 16  GradeLow           14 non-null     object 
 17  GradeHigh          14 non-null     object 
 18  Charter            14 non-null     object 
 19  CharterNum         0 non-null      object 
 20  FundingType        0 non-null      object 
 21  Virtual            14 non-null     object 
 22  Magnet             14 non-null     object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   14 non-null     object 
 26  Street             14 non-null     object 
 27  City               14 non-null     object 
 28  Zip                14 non-null     object 
 29  State              14 non-null     object 
 30  CongUS             14 non-null     float64
 31  SenateCA           14 non-null     float64
 32  AssemblyCA         14 non-null     float64
 33  Locale             14 non-null     object 
 34  Latitude           14 non-null     float64
 35  Longitude          14 non-null     float64
 36  LocSource          14 non-null     object 
 37  EnrollTotal        14 non-null     int64  
 38  AAcount            14 non-null     int64  
 39  AApct              14 non-null     float64
 40  AIcount            14 non-null     int64  
 41  AIpct              14 non-null     float64
 42  AScount            14 non-null     int64  
 43  ASpct              14 non-null     float64
 44  FIcount            14 non-null     int64  
 45  FIpct              14 non-null     float64
 46  HIcount            14 non-null     int64  
 47  HIpct              14 non-null     float64
 48  PIcount            14 non-null     int64  
 49  PIpct              14 non-null     float64
 50  WHcount            14 non-null     int64  
 51  WHpct              14 non-null     float64
 52  MRcount            14 non-null     int64  
 53  MRpct              14 non-null     float64
 54  NRcount            14 non-null     int64  
 55  NRpct              14 non-null     float64
 56  ELcount            14 non-null     int64  
 57  ELpct              14 non-null     float64
 58  FOScount           14 non-null     int64  
 59  FOSpct             14 non-null     float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           14 non-null     int64  
 63  MIGpct             14 non-null     float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            14 non-null     int64  
 69  UPpct              14 non-null     float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 9.3+ KB
None 

*** Table Info: Table Dimensions *** 

(14, 84) 

*** Table Info: CA Schools: Placer County (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 126 entries, 5206 to 5331
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  126 non-null    float64
 1   Y                  126 non-null    float64
 2   OBJECTID           126 non-null    int64  
 3   Year               126 non-null    object 
 4   FedID              126 non-null    int64  
 5   CDSCode            126 non-null    int64  
 6   CDCode             126 non-null    int64  
 7   SCode              126 non-null    int64  
 8   CountyName         126 non-null    object 
 9   DistrictName       126 non-null    object 
 10  SchoolName         126 non-null    object 
 11  SchoolType         126 non-null    object 
 12  Status             126 non-null    object 
 13  OpenDate           126 non-null    object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        126 non-null    object 
 16  GradeLow           126 non-null    object 
 17  GradeHigh          126 non-null    object 
 18  Charter            126 non-null    object 
 19  CharterNum         20 non-null     object 
 20  FundingType        20 non-null     object 
 21  Virtual            126 non-null    object 
 22  Magnet             126 non-null    object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   125 non-null    object 
 26  Street             126 non-null    object 
 27  City               126 non-null    object 
 28  Zip                126 non-null    object 
 29  State              126 non-null    object 
 30  CongUS             126 non-null    float64
 31  SenateCA           126 non-null    float64
 32  AssemblyCA         126 non-null    float64
 33  Locale             126 non-null    object 
 34  Latitude           126 non-null    float64
 35  Longitude          126 non-null    float64
 36  LocSource          126 non-null    object 
 37  EnrollTotal        126 non-null    int64  
 38  AAcount            126 non-null    int64  
 39  AApct              126 non-null    float64
 40  AIcount            126 non-null    int64  
 41  AIpct              126 non-null    float64
 42  AScount            126 non-null    int64  
 43  ASpct              126 non-null    float64
 44  FIcount            126 non-null    int64  
 45  FIpct              126 non-null    float64
 46  HIcount            126 non-null    int64  
 47  HIpct              126 non-null    float64
 48  PIcount            126 non-null    int64  
 49  PIpct              126 non-null    float64
 50  WHcount            126 non-null    int64  
 51  WHpct              126 non-null    float64
 52  MRcount            126 non-null    int64  
 53  MRpct              126 non-null    float64
 54  NRcount            126 non-null    int64  
 55  NRpct              126 non-null    float64
 56  ELcount            126 non-null    int64  
 57  ELpct              126 non-null    float64
 58  FOScount           126 non-null    int64  
 59  FOSpct             126 non-null    float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           126 non-null    int64  
 63  MIGpct             126 non-null    float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            126 non-null    int64  
 69  UPpct              126 non-null    float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 83.7+ KB
None 

*** Table Info: Table Dimensions *** 

(126, 84) 

*** Table Info: CA Schools: Yolo County (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 62 entries, 9937 to 9998
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  62 non-null     float64
 1   Y                  62 non-null     float64
 2   OBJECTID           62 non-null     int64  
 3   Year               62 non-null     object 
 4   FedID              62 non-null     int64  
 5   CDSCode            62 non-null     int64  
 6   CDCode             62 non-null     int64  
 7   SCode              62 non-null     int64  
 8   CountyName         62 non-null     object 
 9   DistrictName       62 non-null     object 
 10  SchoolName         62 non-null     object 
 11  SchoolType         62 non-null     object 
 12  Status             62 non-null     object 
 13  OpenDate           62 non-null     object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        62 non-null     object 
 16  GradeLow           62 non-null     object 
 17  GradeHigh          62 non-null     object 
 18  Charter            62 non-null     object 
 19  CharterNum         7 non-null      object 
 20  FundingType        7 non-null      object 
 21  Virtual            62 non-null     object 
 22  Magnet             62 non-null     object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   60 non-null     object 
 26  Street             62 non-null     object 
 27  City               62 non-null     object 
 28  Zip                62 non-null     object 
 29  State              62 non-null     object 
 30  CongUS             62 non-null     float64
 31  SenateCA           62 non-null     float64
 32  AssemblyCA         62 non-null     float64
 33  Locale             62 non-null     object 
 34  Latitude           62 non-null     float64
 35  Longitude          62 non-null     float64
 36  LocSource          62 non-null     object 
 37  EnrollTotal        62 non-null     int64  
 38  AAcount            62 non-null     int64  
 39  AApct              62 non-null     float64
 40  AIcount            62 non-null     int64  
 41  AIpct              62 non-null     float64
 42  AScount            62 non-null     int64  
 43  ASpct              62 non-null     float64
 44  FIcount            62 non-null     int64  
 45  FIpct              62 non-null     float64
 46  HIcount            62 non-null     int64  
 47  HIpct              62 non-null     float64
 48  PIcount            62 non-null     int64  
 49  PIpct              62 non-null     float64
 50  WHcount            62 non-null     int64  
 51  WHpct              62 non-null     float64
 52  MRcount            62 non-null     int64  
 53  MRpct              62 non-null     float64
 54  NRcount            62 non-null     int64  
 55  NRpct              62 non-null     float64
 56  ELcount            62 non-null     int64  
 57  ELpct              62 non-null     float64
 58  FOScount           62 non-null     int64  
 59  FOSpct             62 non-null     float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           62 non-null     int64  
 63  MIGpct             62 non-null     float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            62 non-null     int64  
 69  UPpct              62 non-null     float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 41.2+ KB
None 

*** Table Info: Table Dimensions *** 

(62, 84) 

*** Table Info: CA Schools: Yuba County (2019-20) *** 

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40 entries, 9999 to 10038
Data columns (total 84 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   X                  40 non-null     float64
 1   Y                  40 non-null     float64
 2   OBJECTID           40 non-null     int64  
 3   Year               40 non-null     object 
 4   FedID              40 non-null     int64  
 5   CDSCode            40 non-null     int64  
 6   CDCode             40 non-null     int64  
 7   SCode              40 non-null     int64  
 8   CountyName         40 non-null     object 
 9   DistrictName       40 non-null     object 
 10  SchoolName         40 non-null     object 
 11  SchoolType         40 non-null     object 
 12  Status             40 non-null     object 
 13  OpenDate           40 non-null     object 
 14  ClosedDate         0 non-null      float64
 15  SchoolLevel        40 non-null     object 
 16  GradeLow           40 non-null     object 
 17  GradeHigh          40 non-null     object 
 18  Charter            40 non-null     object 
 19  CharterNum         6 non-null      object 
 20  FundingType        6 non-null      object 
 21  Virtual            40 non-null     object 
 22  Magnet             40 non-null     object 
 23  TitleIStatus       0 non-null      float64
 24  DASS               0 non-null      float64
 25  AssistStatusESSA   38 non-null     object 
 26  Street             40 non-null     object 
 27  City               40 non-null     object 
 28  Zip                40 non-null     object 
 29  State              40 non-null     object 
 30  CongUS             40 non-null     float64
 31  SenateCA           40 non-null     float64
 32  AssemblyCA         40 non-null     float64
 33  Locale             40 non-null     object 
 34  Latitude           40 non-null     float64
 35  Longitude          40 non-null     float64
 36  LocSource          40 non-null     object 
 37  EnrollTotal        40 non-null     int64  
 38  AAcount            40 non-null     int64  
 39  AApct              40 non-null     float64
 40  AIcount            40 non-null     int64  
 41  AIpct              40 non-null     float64
 42  AScount            40 non-null     int64  
 43  ASpct              40 non-null     float64
 44  FIcount            40 non-null     int64  
 45  FIpct              40 non-null     float64
 46  HIcount            40 non-null     int64  
 47  HIpct              40 non-null     float64
 48  PIcount            40 non-null     int64  
 49  PIpct              40 non-null     float64
 50  WHcount            40 non-null     int64  
 51  WHpct              40 non-null     float64
 52  MRcount            40 non-null     int64  
 53  MRpct              40 non-null     float64
 54  NRcount            40 non-null     int64  
 55  NRpct              40 non-null     float64
 56  ELcount            40 non-null     int64  
 57  ELpct              40 non-null     float64
 58  FOScount           40 non-null     int64  
 59  FOSpct             40 non-null     float64
 60  HOMcount           0 non-null      float64
 61  HOMpct             0 non-null      float64
 62  MIGcount           40 non-null     int64  
 63  MIGpct             40 non-null     float64
 64  SEDCount           0 non-null      float64
 65  SEDpct             0 non-null      float64
 66  SWDcount           0 non-null      float64
 67  SWDpct             0 non-null      float64
 68  UPcount            40 non-null     int64  
 69  UPpct              40 non-null     float64
 70  ELATested          0 non-null      float64
 71  ELAStdMetPct       0 non-null      float64
 72  MathTested         0 non-null      float64
 73  MathStdMetPct      0 non-null      float64
 74  CCPrepCohortCount  0 non-null      float64
 75  CCPrepPct          0 non-null      float64
 76  AbsentEligCount    0 non-null      float64
 77  AbsentPct          0 non-null      float64
 78  GradCohortCount    0 non-null      float64
 79  GradPct            0 non-null      float64
 80  UCCSUReqMetPct     0 non-null      float64
 81  DropOutPct         0 non-null      float64
 82  EnrollCumulative   0 non-null      float64
 83  SuspPct            0 non-null      float64
dtypes: float64(43), int64(19), object(22)
memory usage: 26.6+ KB
None 

*** Table Info: Table Dimensions *** 

(40, 84) 

In [110]:
# 03.00 - map functions

# tutorial - folium plot with cluster markers
# https://python-visualization.github.io/folium/quickstart.html
# https://www.jpytr.com/post/analysinggeographicdatawithfolium/
# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb

# function to plot coordinates with cluster markers
def plot_cluster(col1, col2, icon_color, cluster_name, map):
    # zip lat/long into list
    location = list(zip(col1, col2))
    # icon = [folium.Icon(color='red') for _ in range(len(location_sac))]
    icon = [folium.Icon(color=icon_color) for _ in range(len(location))]
    # plot clusters
    cluster = MarkerCluster(
        # name='CA Schools: Sac County, 2019-20 (Red)',
        name=cluster_name,
        control=True,
        locations=location,
        icons=icon
    )
    map.add_child(cluster)
    return(map)

# function to add map controls and title
# https://stackoverflow.com/questions/37466683/create-a-legend-on-a-folium-map
# https://stackoverflow.com/questions/61928013/adding-a-title-or-text-to-a-folium-map
def plot_map(loc_title, file_path, map):
    # add legend and layer control
    map.add_child(folium.map.LayerControl())
    # add map title
    loc = loc_title
    title_html = '''
                 <h3 align="center" style="font-size:16px"><b>{}</b></h3>
                 '''.format(loc)
    map.get_root().html.add_child(folium.Element(title_html))
    # display and save map
    display(map)
    map.save(file_path)

# function to plot geojson
# https://medium.com/@rohanguptha.bompally/python-data-visualization-using-folium-and-geopandas-981857948f02
def plot_geojson(json_file, layer_title, style, map):
    # note: add json to map; however, geojson function only reads json
    # https://shallowsky.com/blog/mapping/folium-with-shapefiles.html
    folium.GeoJson(
        json_file,
        name=layer_title,
        control=True,
        style_function=lambda x:style
    ).add_to(map)
    return(map)

# function to import geojson, then convert to json
# https://github.com/lesley2958/twilio-geospatial
def geojson2json(file_path):
    # import geojson and view data source
    # https://raw.githubusercontent.com/lesley2958/twilio-geospatial/master/data/states.geojson
    sacog_lihm_geojson = gpd.read_file(file_path)
    # print(sacog_lihm_geojson.head(5), '\n')
    # convert to json
    sacog_lihm_json = sacog_lihm_geojson.to_json()
    # print(sacog_lihm_json)
    return(sacog_lihm_json)

# sacog - lihm areas (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0?selectedAttribute=COUNTYFP10
sacog_lihm_json = geojson2json('data/sacog_lihm_areas_2016.geojson')

# city of sac - existing bike facilities (2018)
# http://data.cityofsacramento.org/datasets/15f8e048d9ad4442a3e12b6182bcd4f2_1?geometry=-121.899%2C38.464%2C-121.028%2C38.652
citysac_bike_fac_json = geojson2json('data/citysac_bike_fac_2018.geojson')

# city of sac - bikeshare opportunity areas (2016)
# http://data.cityofsacramento.org/datasets/8439c4e091a2434aafee1cf888b061f0_0?geometry=-122.330%2C38.373%2C-120.589%2C38.749
citysac_bikeshare_json = geojson2json('data/citysac_bikeshare_areas_2016.geojson')

# sacog - hfta-scs data (2020)
# http://data.sacog.org/datasets/high-frequency-transit-area-mtp-scs-2020
sacog_hfta_json = geojson2json('data/sacog_htfa_2020.geojson')

# sacog - hq-transit, sb375 data (2017)
# http://data.sacog.org/datasets/high-quality-transit-2036?geometry=-123.179%2C38.303%2C-119.697%2C39.053
sacog_sb375_json = geojson2json('data/sacog_sb375_2017.geojson')

# sacog - calenviroscreen3.0, top-25 tracks
# http://data.sacog.org/datasets/calenviroscreen-3-0-top-25-tracts?geometry=-123.212%2C38.343%2C-119.729%2C39.093
sacog_calenv_json = geojson2json('data/sacog_calenv_top25.geojson')

# sacog - air pollution, pm2.5 planning areas (2018)
# http://data.sacog.org/datasets/sacramento-pm-2-5-planning-area-
sacog_pm25_json = geojson2json('data/sacog_pm25_2018.geojson')
In [111]:
# 03.01 - map plot: sfpd tutorial

# note: module based on tutorial below
# https://blog.dominodatalab.com/creating-interactive-crime-maps-with-folium/

# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry

# set origin
latlong_sf = (37.76, -122.45)

# create map
map_sfpd = folium.Map(location=latlong_sf, zoom_start=12)

# call function to plot coordinates with cluster markers
map_sfpd = plot_cluster(
    df_sfpd.Y,
    df_sfpd.X,
    'red',
    'SFPD: Crime Reports, 2003-2018 (Red)',
    map_sfpd
)

# call function to add map controls and title
plot_map(
    'Crime Report Map: City of San Francisco (2003-2018)',
    'maps/03.01_sfpd_reports.html',
    map_sfpd
)
In [112]:
# 03.02 - map plot: sac area with lihm areas and schools

# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson

# sacog - lihm shapefile (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0

# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)

# create map
map_sac_lihm_school = folium.Map(location=latlong_sac, zoom_start=8)

# plot sacog lihm data
style_sac_lihm = {
    'line_opacity': 0.5
}
# call function to plot geojson
map_sac_lihm_school = plot_geojson(
    sacog_lihm_json,
    'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Blue)',
    style_sac_lihm,
    map_sac_lihm_school
)

# call function to plot coordinates with cluster markers
map_sac_lihm_school = plot_cluster(
    df_school_sac.Latitude,
    df_school_sac.Longitude,
    'red',
    'CA Schools: Sac County, 2019-20 (Red)',
    map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
    df_school_amador.Latitude,
    df_school_amador.Longitude,
    'green',
    'CA Schools: Amador County, 2019-20 (Green)',
    map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
    df_school_placer.Latitude,
    df_school_placer.Longitude,
    'blue',
    'CA Schools: Placer County, 2019-20 (Blue)',
    map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
    df_school_yolo.Latitude,
    df_school_yolo.Longitude,
    'orange',
    'CA Schools: Yolo County, 2019-20 (Oranage)',
    map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
    df_school_yuba.Latitude,
    df_school_yuba.Longitude,
    'purple',
    'CA Schools: Yuba County, 2019-20 (Purple)',
    map_sac_lihm_school
)

# call function to add map controls and title
plot_map(
    'Sacramento Area Asset Map: LIHM Communities and Schools',
    'maps/03.02_sac_lihm_school.html',
    map_sac_lihm_school
)
In [113]:
# 03.03 - map plot: city of sac, lihm areas and bike facilities

# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)

# create map
map_sac_lihm_bike = folium.Map(location=latlong_sac, zoom_start=12)

# plot sacog lihm data
style_sac_lihm = {
    'fillColor': '#ff4500',
    'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_bike = plot_geojson(
    sacog_lihm_json,
    'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
    style_sac_lihm,
    map_sac_lihm_bike
)

style_bike_fac = {
    'fillColor': '#008000',
    'color': '#008000'
}
map_sac_lihm_bike = plot_geojson(
    citysac_bike_fac_json,
    'City of Sac: Bike Facilities, 2018 (Green)',
    style_bike_fac,
    map_sac_lihm_bike
)

style_bikeshare = {
    'fillColor': '#9370db',
    'color': '#9370db'
}
map_sac_lihm_bike = plot_geojson(
    citysac_bikeshare_json,
    'City of Sac: Bikeshare Opportunity Areas, 2016 (Purple)',
    style_bikeshare,
    map_sac_lihm_bike
)

# call function to add map controls and title
plot_map(
    'City Sacramento Asset Map: LIHM Communities and Bike Facilities',
    'maps/03.03_sac_lihm_bike.html',
    map_sac_lihm_bike
)
In [114]:
# 03.04 - map plot: city of sac, lihm areas and public transit

# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)

# create map
map_sac_lihm_transit = folium.Map(location=latlong_sac, zoom_start=12)

# plot sacog lihm data
style_sac_lihm = {
    'fillColor': '#ff4500',
    'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_transit = plot_geojson(
    sacog_lihm_json,
    'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
    style_sac_lihm,
    map_sac_lihm_transit
)

style_sac_hfta = {
    'fillColor': '#9370db',
    'color': '#9370db'
}
map_sac_lihm_transit = plot_geojson(
    sacog_hfta_json,
    'SACOG: High Frequency Transit Areas (HFTAs), 2020 (Purple)',
    style_sac_hfta,
    map_sac_lihm_transit
)

style_sac_sb375 = {
    'fillColor': '#008000',
    'color': '#008000'
}
map_sac_lihm_transit = plot_geojson(
    sacog_sb375_json,
    'SACOG: High Quality Transit (SB375), 2017 (Green)',
    style_sac_sb375,
    map_sac_lihm_transit
)

# call function to add map controls and title
plot_map(
    'City Sacramento Asset Map: LIHM Communities and Transit',
    'maps/03.04_sac_lihm_transit.html',
    map_sac_lihm_transit
)
In [115]:
# 03.05 - map plot: city of sac, lihm areas and pollution

# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)

# create map
map_sac_lihm_pollution = folium.Map(location=latlong_sac, zoom_start=12)

# plot sacog lihm data
style_sac_lihm = {
    'fillColor': '#ff4500',
    'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_pollution = plot_geojson(
    sacog_lihm_json,
    'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
    style_sac_lihm,
    map_sac_lihm_pollution
)

style_sac_pm25 = {
    'fillColor': '#9370db',
    'color': '#9370db'
}
map_sac_lihm_pollution = plot_geojson(
    sacog_pm25_json,
    'SACOG: Air Pollution PM 2.5 Planning Areas, 2018 (Purple)',
    style_sac_pm25,
    map_sac_lihm_pollution
)

style_sac_calenv = {
    'fillColor': '#008000',
    'color': '#008000'
}
map_sac_lihm_pollution = plot_geojson(
    sacog_calenv_json,
    'SACOG: CalEnviroScreen3.0, Top 25% Tracks (Green)',
    style_sac_calenv,
    map_sac_lihm_pollution
)

# call function to add map controls and title
plot_map(
    'City Sacramento Asset Map: LIHM Communities and Pollution Levels',
    'maps/03.05_sac_lihm_pollution.html',
    map_sac_lihm_pollution
)